Noise in text can be defined as any kind of difference between the surface form of a coded representation of the text and the intended, correct, or original text.
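As an illustration of this definition, the difference between an observed surface form and its intended form can be quantified with a string distance. The sketch below uses Levenshtein (edit) distance, which is an assumption made here for illustration and not a measure prescribed by the definition; the function name `edit_distance` and the sample strings are hypothetical.

```python
def edit_distance(observed: str, intended: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `observed`
    into `intended`."""
    # prev[j] holds the distance between the processed prefix of
    # `observed` and the first j characters of `intended`.
    prev = list(range(len(intended) + 1))
    for i, a in enumerate(observed, start=1):
        curr = [i]  # turning i characters into an empty string costs i deletions
        for j, b in enumerate(intended, start=1):
            cost = 0 if a == b else 1
            curr.append(min(
                prev[j] + 1,         # delete a from observed
                curr[j - 1] + 1,     # insert b into observed
                prev[j - 1] + cost,  # substitute a with b (free if equal)
            ))
        prev = curr
    return prev[-1]


# An SMS-style abbreviation compared with its intended word.
print(edit_distance("tmrw", "tomorrow"))  # 4
```

A distance of zero means the surface form matches the intended text exactly; larger values indicate more character-level noise. This captures only spelling-level differences, not noise at the level of grammar or meaning.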
Language used in computer-mediated communication, such as chats, emails and SMS texts, differs significantly from the standard form of the language. Two competing pressures shape the structure of such text: the urge to shorten messages so that they can be typed faster, and the need to remain semantically clear.
Gartner estimates that unstructured data constitutes 80% of all enterprise data. A large proportion of this unstructured data consists of chat transcripts, emails and other informal and semi-formal internal and external communications.
Such text is usually meant for human consumption. However, given the large volumes now available both online and within the enterprise, it has become important to mine it automatically.
Many spell checkers and grammar checkers are available today. Word processors such as MS Word include them among their editing tools. Online, Google's search interface includes a correction engine that suggests alternatives when users make mistakes in their queries.